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Evolutionary crystal structure prediction proved to be a powerful approach for studying a wide range of ma- 
terials. Here, we present a specifically designed algorithm for the prediction of the structure of complex crystals 
consisting of well-defined molecular units. The main feature of this new approach is that each unit is treated 
as a whole body, which drastically reduces the search space and improves the efficiency, but necessitates the 
introduction of new variation operators described here. To increase diversity of the population of structures, the 
initial population and part(~20%) of the new generations are produced using space group symmetry combined 
with random cell parameters and random positions and orientations of molecular units. We illustrate the effi- 
ciency and reliability of this approach by a number of tests (ice, ammonia, carbon dioxide, methane, benzene, 
glycine and butane- 1,4-diammonium dibromide). This approach easily predicts the crystal structure of methane 
A containing 21 methane molecules (105 atoms) per unit cell. We demonstrate that this new approach has also a 
high potential for the study of complex inorganic crystals as shown on examples of a complex hydrogen storage 
material Mg(BH4)2 and elemental boron. 



INTRODUCTION 

Structure is the most important piece of information about 
a material, as it determines most of its physical properties. 
It was long believed that crystal structures are fundamentally 
unpredictable |1, 2|. However, the situation began to change 
dramatically in the last decade. As the stable structure cor- 
responds to the global minimum of the free energy, several 
global optimization algorithms have been devised and used 
with some success for crystal structure prediction (CSP) for 
instance, simulated annealing |3]|4], metadynamics |5|, evolu- 
tionary algorithms |6 , 7 1, random sampling |8 1, basin hopping 
||9 1, minima hopping 1 10|, and data mining 1 1 1 1. For inorganic 
crystals, in many cases it is already now possible to predict the 
stable structure at arbitrary external pressure. Towards the am- 
bition of designing novel materials prior to their synthesis in 
the laboratory, reliable and efficient prediction of the structure 
of more complex (in particular, molecular) crystals becomes 
imperative. 

Molecular crystals are extremely interesting because of 
their applications as pharmaceuticals, pigments, explosives, 
and metal-organic frameworks lfT2l[T3l . The periodically con- 
ducted blind tests of organic crystal structure prediction, or- 
ganized by Cambridge Crystallographic Data Centre (CCDC) 
have been the focal point for this community and they reflect 
steady progress in the field 1 14- 18 1. The tests show that it is 
now possible to predict the packing of a small number of rigid 
molecules, provided there are cheap force fields accurately de- 
scribing the intermolecular interactions. In these cases, effi- 
ciency of search for the global minimum on the energy land- 
scape is not crucial. However, if one has to use expensive ab 
initio total energy calculations or study systems with a large 
number of degrees of freedom (many molecules, especially if 
they have conformational flexibility), the number of possible 



structures becomes astronomically large and efficient search 
techniques become critically important. 

In addition, the nature of weak chemical interactions makes 
it common that a molecules have a wide variety of ways of 
packing with lattice energies within a few kJ/molecule of the 
most stable structure. Thus prediction of such large structures 
is certainly a challenge, especially if the number of trial struc- 
tures has to be kept low to enable practical ab initio structure 
predictions. Recent pioneering works fT9'-'2Tl, in particular, 
using metadynamics |20 | offer inspiring examples of this. 

Compared to other methods, evolutionary algorithms have 
special advantages. Exploring the energy surface, such al- 
gorithms arrive at the global minimum by a series of in- 
telligently designed moves, involving self-learning and self- 
improvement of the population of crystal structures |7 |. Our 
USPEX (Universal Structure Predictor: Evolutionary Xtallog- 
raphy) code 161 l22lj25l . proved to be extremely efficient and 
reliable for atomic crystals, and here we present an extension 
of this algorithm to complex crystals made of well-defined 
units. In the following sections, we will mainly discuss molec- 
ular crystals. Crystals containing complex ions and clusters 
can be equally well studied using the methodology developed 
here, as we show by two tests on challenging systems. 



METHODOLOGY 

Compared to the prediction of atomic structures, there are 
several additional considerations to be taken into account for 
molecular crystals: 

i) A typical unit cell contains many more atoms than a usual 
inorganic structure, which means an explosion of computing 
costs if all of these atoms are treated independently; 

ii) Molecules are bound by weak forces, such as the van der 
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Waals (vdW) interactions, and the inter-molecular distances 
are typically larger than those in atomic crystals, which leads 
to the availability of large empty space; 

iii) Most of the molecular compounds are thermodynam- 
ically less stable than simpler molecular compounds from 
which they can be obtained (such as H2O, CO2, CH4, NH3). 
This means that a fully unconstrained global optimization ap- 
proach in many cases will produce a mixture of these simple 
molecules, which are of little interest to the organic chemist. 
To study the packing of the actual molecules of interest, it is 
necessary to fix the intramolecular connectivity; 

iv) Crystal structures tend to be symmetric, and the distri- 
bution of structures over symmetry groups is extremely un- 
even I26i . For example, 35% of inorganic and 45% of organic 
materials have the point group 2/m. Compared to inorganic 
crystals, there is a stronger preference of organic crystals to a 
smaller number of space groups. Over 80% of organic crys- 
tals are found to possess space groups: P2i/c (36.59%), P-l 
(16.92%), P2i2i2i (11.00%), C2/c (6.95%), P2i (6.35%) and 
Pbca (4.24%) [|27||. 

The first two points indicate that the search space is huge. 
If we start to search for the global minimum with randomly 
generated structures, it is very likely that most of the time 
will be spent on exploring uninteresting disordered structures 
far away from the global minimum. Fortunately, the last 
two points suggest a way to improve the efficiency. The 
point (iii) implies that the true thermodynamic ground state 
corresponding to most organic compositions is a mixture of 
simpler molecules, which is of little interest to the organic 
chemists. The truly interesting problem, packing of the pre- 
formed molecules, can be solved by constrained global opti- 
mization finding the most stable packing of molecules with 
fixed bond connectivity. This will not only make the global 
optimization process meaningful, but in the same time will 
simplify it, leading to a drastic reduction of the number of de- 
grees of freedom and of the search space. Structure prediction 
(global optimization) must involve relaxation (local optimiza- 
tion) of all structures, and fixing intra-molecular bond con- 
nectivity has an added benefit of making structure relaxations 
cheaper and more robust. Depending on their chemical nature, 
those molecules shall be treated as fully or partly rigid bod- 
ies during the action of evolutionary variation operators and 
local optimization. Another improvement of the efficiency 
is achieved by using symmetry in the random generation of 
new structures - a population of symmetric structures is usu- 
ally more diverse than a set of fully random (often disordered) 
structures. Diversity of the population of structures is essen- 
tial for the success and efficiency of evolutionary simulations. 

We have successfully implemented the adapted evolution- 
ary algorithm in the USPEX code. Briefly, our procedure is as 
follows (as shown in Fig. [T]i. 

1) The initial structures are usually generated randomly, 
with randomly selected space groups. First, we randomly pick 
one of 230 space groups, and set up a Bravais cell according 
to the prespecified initial volume with random cell parame- 
ters consistent with the space group. Then one molecule is 
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FIG. 1 . Illustration of the constrained evolutionary algorithm. 



randomly placed on a general Wyckoff position, and is multi- 
plied by space group operations. If two or more symmetry- 
related molecules are found close to each other, we merge 
them into one molecule that sits on a special Wyckoff po- 
sition and has averaged coordinates of the molecular center 
and averaged orientational vectors (or random, when averag- 
ing gives zero). Adding new molecular sites one by one, until 
the correct number of molecules is reached, we get a random 
symmetric structure. During this process, we also make sure 
that no molecules overlap or sit too close to each other. All 
produced unit cell shapes are checked and, if necessary, trans- 
formed to maximally orthogonal shapes 1231 . 

2) Structural relaxation is done stepwise from low to high 
precision. At the initial stages, we employ the SIESTA code 
|28| for first-principles simulations, which allows the con- 
strained geometry relaxation. As an option, we can use 
the DMACRYS code |^ for classical calculations. We 
note that SIESTA provides Z-matrix representation for the 
molecules lf30l . enabling the specification of the molecular ge- 
ometry and its internal degrees of freedom (important when 
dealing with conformationally flexible molecules). For the fi- 
nal stages of relaxation, we can keep the molecules fully or 
partly rigid, or allow their complete relaxation (in the latter 
case, such codes as GULP i31] and VASP [32 1 are also sup- 
ported in USPEX). It is a good strategy to relax the structures 
in SIESTA with constrained molecular geometry at the begin- 
ning stage and then fully relax them using VASP, and here we 
adopt this strategy for all studied systems. It is well known 
that DFT within local and semilocal approximations, such as 
the LDA or GGA, cannot describe vdW dispersion interac- 
tions well (e.g., II33I ). and we therefore used the GGAh-D ap- 
proach that includes a damped dispersion correction |34|; this 
approach is known to work well for molecular crystals. 

3) At the end of each generation, all structures in the gen- 
eration are compared using their fingerprints [24] and all non- 
identical structures are ranked by their (free) energies or (if 
the calculation is done at T = K, as we do here) enthalpies. 
There is an important technical aspect: intramolecular con- 
tributions are identical for all different packings of the same 
molecule and thus decrease the discriminatory power of the 
fingerprint function. Therefore, we remove the intramolecu- 
lar distances from the computation of the fingerprint function 
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FIG. 2. Illustration of the variation operators: a) heredity; b) coordinate mutation; c) rotational mutation. 



when dealing with molecular crystals. 

A certain percentage of higher-energy structures in the pop- 
ulation are discarded, and the rest participate in creating the 
next generation using variation operators detailed below. 

To ensure properly constrained global optimization, we not 
only generate the structures made of the desired molecules, 
but also check that the bond connectivity has not changed af- 
ter relaxation - structures with altered connectivity graphs are 
discarded. 

4) Child structures (new generation) are produced from par- 
ent structures (old generation) using one of the following vari- 
ation operators: (i) heredity, (ii) permutation, (iii) coordinate 
mutation are the same as in atomic crystal structures 161 [25l . 
with the only difference that variation operators act on the ge- 
ometric centers of the molecules and their orientations, i.e. 
whole molecules, rather than single atoms, are considered as 
the minimum building blocks. Since molecules cannot be con- 
sidered as spherically symmetric point particles, additional 
variation operators must be introduced: (iv) rotational muta- 
tion of the whole molecules, (v) softmutation - a hybrid oper- 
ator of coordinate and rotational mutation. Fig. |2]shows how 
variation operators work in our algorithm. Below we describe 
how these variation operators were used in our tests. 

Heredity: This operator cuts planar slices from each indi- 
vidual and combines these to produce a child structure. In 
heredity, each molecule is represented by its geometric center 
(Fig. [2^) and orientation. From each parent, we cut (paral- 
lel to a randomly selected coordinate plane of the unit cell) a 
slab of random thickness (within bounds of 0.25-0.75 of the 
cut lattice vector) at a random height in the cell. If the total 
number of molecules of each type obtained from combining 
the slabs does not match the desired number of molecules, a 
corrector step is performed: molecules in excess are removed 
while molecules in shortage are added; molecules with higher 
local degree of order have higher probability to be added and 



lower probability to be removed. This is equivalent to our 
original implementation of heredity for atomic crystals 16J 

Permutation: this operator swaps chemical identity in ran- 
domly selected pairs of molecules. 

Coordinate mutation: All the centers of molecules are dis- 
placed in random directions, the distance for this movement 
for molecule / being picked from a zero-mean Gaussian dis- 
tribution with a defined as: 



(1) 



where 11 is the local order of the molecule. Thus molecules 
with higher order are perturbed less than molecules with low 
order (Fig. [2]3). We calculate the local order parameter of 
each molecule from its fingerprint lf24l . in the computation 
of which only centers of all molecules are used. In the tests 
described here CTmax represents the order of a typical inter- 
molecular distance. 

Rotational mutation: A certain number of randomly se- 
lected molecules are rotated by random angles (Fig. [2]:). For 
rigid molecules, there are only 3 varibles to define the orien- 
tation of the molecules. For flexible molecules, we also al- 
low the mutation of torsional angles of the flexible groups. A 
large rotation can have a marked effect on global optimization, 
helping the system to jump out of the current local minimum 
and find optimal oiientational ordering. 

Softmutation: This powerful operator, introduced first for 
atomic crystals |25|, involves atomic displacements along the 
softest mode eigenvectors, or a random linear combination 
of softest eigenvectors. In the context of molecular crystals 
one must operate with rigid-unit modes and this operator be- 
comes a hybrid operator, combining rotational and coordinate 
mutations. In this case, the eigenvectors are calculated first, 
and then projected onto translational and rotational degrees of 
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freedom of each molecule and the resulting changes of molec- 
ular positions and orientations are applied preserving rigidity 
of the fixed intramolecular degrees of freedom. To calculate 
efficiently the normal modes, we construct the dynamical ma- 
trix from bond hardness coefficients |25 1. The same structures 
can be softmutated many times, each time along the eigenvec- 
tor of a new mode. 

At the end of the selection, the best individuals in the last 
generation (usually up to 5) are kept. To maintain diversity 
of the population, some fraction (usually 15% - 30%) of pop- 
ulation is randomly generated with symmetry. While simple 
random generation does not improve diversity, the use of sym- 
metry does allow a diverse set of structures to be produced. 

5) The simulation is stopped once a predefined halting cri- 
terion is met. The lowest-energy structures found in USPEX 
are then carefully relaxed with higher precision using the same 
level of theory: the all-electron projector-augmented wave 
(PAW) method IS), as implemented in the VASP code ll32l . 
at the level of generalized gradient approximation (GGA) ll36l 
for inorganic systems; or dispersion-corrected GGAh-D f34\ 
approximation for organic crystals. We used the plane wave 
kinetic energy cutoff of 550 eV and the Brillouin zone was 
sampled with a resolution of 2tt x 0.07 A^^, which showed 
excellent convergence of the energy differences, stress tensors 
and structural parameters. 



TESTS AND APPLICATIONS 

Here we discuss tests of USPEX on system with well- 
known stable phases and also show how our method finds 
hitherto unknown structures. The test cases (including ice, 
methane, ammonia, carbon dioxide, benzene, glycine and 
butane- 1 ,4-diammonium dibromide, etc) cover a wide range 
of systems with different molecular shapes (tetrahedral, lin- 
ear, bent, planar and small biomolecules), and chemical inter- 
actions (vdW dispersion, ionic, covalent and metallic bond- 
ing, strong/weak hydrogen bonding, tt-tt stacking, organic 
and inorganic molecular systems, etc). All the calculations 
discussed below were performed in the framework of DFT or 
DFTh-D. Driven by the USPEX code, the structures were ini- 
tially relaxed in SIESTA with constrained molecular geometry 
and then fully relaxed in VASP at the final stages. 



Ice 

Ice (H2O) is an archetypal hydrogen-bonded molecular 
crystal. The orientations of hydrogen bonds locally obey the 
well-known ice rules, that is, each oxygen atom is tetrahe- 
drally bonded to four hydrogen atoms, by two strong covalent 
intra-molecular bonds and two much weaker inter-molecular 
bonds (hydrogen bonds). Given the enormous number of pos- 
sibilities of placing and orienting (even under ice rules) wa- 
ter molecules, the prediction of the ice structure is a complex 



task: according to Maddox(1988), it is still thought to lie be- 
yond the mortals' ken. 

The normal crystalline form of ice, ice I/,, is disordered 
and has hexagonal symmetry, with oxygen atoms arranged in 
a hexagonal diamond motif (a cubic diamond-type ice I^. is 
also known experimentally) with randomly oriented hydrogen 
bonds. In experiment 1371 [38l . ice XI (the ordered version of 
ice I/,), found to be the most stable polymorph at 1 atm and low 
temperatures, but the transformation from disordered ice I/, to 
ordered ice XI is kinetically hindered, and this is why special 
approaches are needed for experimental preparation of ice XI 

EH. 

With variable-cell USPEX simulations for a 4-molecules 
cell at 1 atm, we indeed identify ice XI as the most stable 
polymorph (Fig. [3^). This structure was found within just 4 
generations, after relaxing ~^160 structures. Fig.|4]shows how 
the lowest energy changed from generation to generation in 
our calculation. This purely quantum-mechanical calculation 
required less than 1 day on 8 cores of a Dell XPS desktop PC. 
Apart from ice XI, we found several remarkable structures in 
the same run. 

An ordered version of ice 1^ 1391 . a tetragonal phase with 
a cubic diamond-type oxygen sublattice (Fig. |3j)), was found 
to be energetically competitive with ice XI. At both the GGA 
and GGAh-D levels of theory, its energy is only 2 meV per 
molecule above that of ice XI. We have also found an inter- 
esting low-energy metastable polymorph (Fig. |3j;), where the 
oxygen sublattice has topology of the hypothetical bct4 al- 
lotrope of carbon ||40ll4n . The bct4-like structure of ice was 
also found from molecular dynamics simulation of the water's 
adsorption on the surface of hydroxylated /3-cristobalite |42|. 
Proton ordering lowers its symmetry from lAlmmm to Cm. 

Methane 

Methane, the simplest of saturated hydrocarbons, is an im- 
portant constituent of gas-giant planets Uranus and Neptune 
P3| . The high-pressure behavior of methane is of extreme 
importance for fundamental chemistry, as well as for under- 
standing the physics and chemistry of planetary interiors. 

The tetrahedral CH4 molecules are interacting practically 
only by vdW dispersion forces with each other. In spite of 
the simplicity of the molecule, the phase diagram of methane 
is still not well established fl4l - l47l . Different experiments on 
methane were conducted during the last few decades, resulting 
in a complex phase diagram drawn by Bini and Pratesi |44|. 
Of the nine solid phases in the diagram, only the structures 
of phases I, II, and III have been determined, while phases 11, 
III, IV, V, and VI, only exist below 150 K and at moderate 
pressures. CH4 is expected to become chemically unstable 
and decompose at megabar pressures l48l . 

The high-pressure phases of solid methane above 5 GPa 
have been the subject of numerous experimental and theoret- 
ical studies, however, the understanding are still incomplete. 
Bini and Pratesi |44| based on IR and Raman data, proposed 
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FIG. 3. Ice polymorphs at 1 atm found by USPEX. a) Ice XI, derived from ice I;,, space group Cmc2\, a=4.338 A, b=7.554 A, c=7.094 
A, 01(0,0.6651,0.0623), 02(0.5,0.8328,-0.0622), Hl(0,0.6638,0.2045), H2(0,0.5373,-0.019I), H3(0.6865,-0.2329,-0.0140); b) tetragonal 
phase, derived from ice I,., space group /4imd, a=4.415 A, c=6.008 A, 0(0,0.5,0.0006), H(0.I839,0.5,0.I000); c) bct-4 like ice, space 
group Cm, a=4.472 A, b=10.451 A, c=5.744 A, ^=111.3°, 01(0,0,0), 02(0.3691,0,0.7197), 03(0.6647,0.3192,0.3605), H1(0.7683,0,0.887I), 
H2(0.1317,0,0,8908), H3(0.4705, 0.2691, 0.3567), H4(0.0923,0.8787,0.2133), H5(0.30I8,0.0751,0.6030). 
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FIG. 4. Prediction of the crystal structure of ice at GPa. The lowest 
energy at each generation is shown relative to the ground state. Each 
generation contains 30 structures. The ground state structure ice XI 
was found at 4th generation. We also found cubic ice and bct4-like 
ice at the same calculation. 

a tetragonal crystal structure for the phase A, while high- 
pressure X-ray powder diffraction experiments suggested that 
the unit cell contains 21 molecules in the pseudocubic rhom- 
bohedral unit cell ll46ll . 

We performed structure prediction simulation for CH4 us- 
ing experimental cell parameters |47] (a=8.36 A at 11 GPa). 
We indeed found the best structure to possess a rhombohedral 
symmetry, and this structure was found within 8 generations, 
and is characterized by the icosahedral packing of methane 
molecules, and this packing fully explains the unusual num- 
ber 27 of molecules in the cell: 1 molecule is located in the 
center of the unit cell, 12 molecules around it form an icosa- 



hedron, and the remaining & molecules are located above the 
triangular faces of the icosahedron (Fig. [5]). A rhombohedral 
model, very similar to ours, was recently proposed, on the ba- 
sis of neutron scattering experiments |45|: the only difference 
is that our model has orientationally disordered molecules (as 
is also most likely to be the case in reality: furthermore, this 
model has a lower energy), while Ref. 1451 assumed orienta- 
tionally ordered molecules. The essential icosahedral charac- 
ter of the structure was not mentioned by Ref. B3]| . but can 
be clearly seen on close inspection of their results. 

Ammonia 

Bonding in NH3 is intermediate between hydrogen bonded 
tetrahedral structure of H2O, and vdW-bonded close-packed 
structure of CH4. Weak hydrogen bonding between neigh- 
boring ammonia molecules results in a pseudo-close-packed 
arrangement in the solid state B9l . It is extremely interesting 
to understand the nature of hydrogen bonding in crystalline 
ammonia, and the properties of ammonia under pressure are 
of fundamental interest, as compressed ammonia has a signif- 
icant role in planetary physics 1 43 1. 

Bonding in NH3 is intermediate between hydrogen bonded 
tetrahedral structure of H2O, and vdW-bonded close-packed 
structure of CH4. Weak hydrogen bonding between neigh- 
boring ammonia molecules results in a pseudo-close-packed 
arrangement in the solid state |49|. It is extremely interesting 
to understand the nature of hydrogen bonding in crystalline 
ammonia; properties of ammonia under pressure are of funda- 
mental interest, as compressed ammonia has a significant role 
in planetary physics Ii43 J . 
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FIG. 5. Structures of methane: (a) illustration of possible sites around the icosahedron, (b)21 -molecule rhombohedral methane, with Fl, F2, 
F3, F4 sites occupied; (c) view of the icosahedral packing in the rhombohedral methane (space group: R-3); Two C sublattices are marked by 
different colors in 21-molecule cell (outside icosahedral, green; within icosahedral, grey) 



At room temperature, ammonia crystallizes at 1 GPa in a 
rotationally disordered, face-centered-cubic phase (phase III) 
149 - 51 1 . X-ray and neutron studies have yielded informa- 
tion about the equation of state and structures of solid ammo- 
nia. The low-P, T phase I of ammonia undergoes a first-order 
phase transition into phase IV at about 3-4 GPa and then into 
phase V at about 14 GPa. Phase I has a cubic structure with 
space group P2i3, while phase IV has been identified as the 
orthorhombic structure with space group P2i2i2i. Phase V 
might have the same space group as phase IV (an isosymmet- 
ric phase transition). 

We carried out variable-cell structure prediction calcula- 
tions at 5, 10, 25, 50 GPa. At low pressures (5 GPa), we found 
that the P2i3 structure to be stable (Fig. [6^), in good agree- 
ment with the experiment. At high pressures, USPEX with- 
out applying symmetry in the initialization still easily found 
P2i3 structure, however, failed to get the ground state struc- 
ture P2i2i2i phase in a simulation with up to 20 generations. 
The energies of whole-molecule rotation are very small com- 
pared to intra-molecular bonding energies, thus making the 
process of finding correct molecular orientations extremely 
difficult. This indicates that the energy landscape of ammonia 
is actually very flat. To enhance the searching efficiency, we 
initialized the first generation using random symmetric struc- 
tures. And to retain diversity of the population, 30% of each 
new generation was produced by random symmetric mecha- 
nism. In this case, the ground state structure (Fig. [6};) ap- 
peared within 6 generations (^210 structural relaxations). In 
addition, we also found the Pljc phase (Fig. [Sj?) reported 
before ll52l . 



Carbon dioxide 

The CO2 molecule has a special significance because it is 
very abundant in nature and is a model system involving tt 



bonding and sp-hybridization of carbon atoms. Similar to 
methane, carbon dioxide is a vdW crystal with strong (weak) 
intra-molecular (inter-molecular) interactions at low pressures 
II53I . At room temperature and 1.5 GPa, CO2 crystallizes as 
dry ice, with a cubic Pai structure. At pressures between 12 
and 20 GPa, CO2-I transforms to the orthorhombic CO2-III 
ll54H56l . According to flie theoretical calculation, C02-(III) 
is metastable, while C02-(II) with the PA2lmnm symmetry is 
believed to be thermodynamically stable [59] It is known that 
above 20 GPa a non-molecular phase (called phase V) with 
tetrahedrally coordinated carbon atoms becomes stable 153). 

In the previous prediction [57J, unconstrained USPEX cal- 
culations succeeded in finding the correct CO2 structures in 
a wide pressure range. By applying molecular constraint, 
we have found the PA2lmnm phase (Fig. |7]i quicker, just 
in two generations (^80 structural relaxations). PA2lmnm 
phase remains the most stable structure made of discrete 
CO2 molecules at least up to 80 GPa. Both experiment [|58l 
and theory ll57l l59l show that CO2 polymerizes above 20 
GPa, while the molecular form {PA2lmnm phase) exists as 
a metastable form at low temperatures and higher pressures. 
This examples shows how imposing constraints gives the most 
stable molecular form, while unconstrained search finds the 
global minimum (which for CO2 is non-molecular above 20 
GPa). Both cases correspond to situations that are experimen- 
tally achievable, and thus important. 

Benzene 

Benzene is the simplest aromatic compound, and it has a 
purely planar molecule, the packing of which is stabilized by 
TT-TT interactions. The crystal structure of benzene is one of the 
most basic and most actively investigated structures. The first 
proposed phase diagram was very complex and contained six 
solid phases |60|. However, recent experimental studies sim- 
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FIG. 6. Crystal structures of ammonia, a) P2i3 phase (stable at 1 - 6 GPa, Z=4); b) P2i/c phase (stable at 6 - 8.5 GPa, Z=4); c) P2i2i2i 
phase (stable at 8.5 - 60 GPa, Z=4) 




FIG. 7. Crystal structure of CO2 II (space group: P42/mnm, Z=2). 



plified it Il6ni62l . At normal conditions, benzene crystallizes 
in the orthorhombic phase I (Pbca). A monoclinic phase II 
(Plilc), with two molecules per unit cell was idenfied above 
1.75 GPa. Phase II is stable up to the onset of a chemical 
reactions (at 41 GPa and 298 K). 

In our simulation we started with the same emperical po- 
tential [;63 | as used in a recent metadynamics study |20|, and 
we reproduced the multiple phases of benzene found there 
and coiTesponding to the old phase diagram. This potential 
was calibrated at normal conditions and may fail at high pres- 
sure. Its predicted many stable phases at different pressures 
(this is consistent with the old phase diagram, but most of 
these phases should be metastable according to the new exper- 
iment). To remedy this, we repeated our structure prediction 
runs at the level of DFTh-D. We performed the calculation at 0, 
5, 10, 25 GPa with Z=4. In our simulation, the experimentally 



observed orthorombic phase (Pbca) was identified as the most 
stable phase at GPa, and then it transforms to PA^lil phase 
at 4 GPa. We also found the monoclinic phase (f 2i/c) (Fig. 
|8]l as the ground state above 7 GPa. Our DFTh-D results give 
fewer stable phases, in agreement with the new phase diagram 
- the only difference is in the PA^lil phase. This phase is ex- 
perimentally known, and according to the latest experimental 
results pM^ '621 is metastable. Previous DFT calculation f64l 
suggested this phase to be stable at pressures 4-7 GPa, which 
is consistent with our results. Thermodynamic stability of the 
f 4321 2 phase needs to be revisited experimentally. 



Glycine 

Glycine, with the formula NH2CH2COOH, is the smallest 
of 20 aminoacids commonly found in proteins. Aminoacids 
are important in nutrition and widely used in the pharmaceu- 
tical industry. 

The polymorphism of glycine was intensely studied ||65l - 
l71i . Glycine is known to crystallize in four polymorphs with 
space groups P2\lc, P2\, P'i2 and P2ilc which are labeled 
a, /3, 7 and a, respectively |65l. The a, /?, and 7 phases 
are found at ambient pressure, with a and [3 phases being 
metastable with respect to the 7 phase, a glycine has recently 
been found to form under pressure |67|. In the gas phase, 
glycine is in a nonionic form, while in all four of the crystal 
structures glycine is zwitterionic (as shown in Fig. |9^). In this 
form, an -NH3+ group on one ion electrostatically interacts 
with a -COO^ group on a neighboring ion. Although zwit- 
terionization causes an increase in energy with respect to the 
gas-phase molecule, it is thought that the zwitterionic crystals 
are stabilized by the increase in number of hydrogen bonds 
that can be formed in comparison to the number that would be 
formed in the nonionic case. 

Since the glycine zwitterion only has the point symme- 
try Ci (i.e. no symmetry), structure prediction of glycine 
is more challenging compared with benzene. We performed 
variable cell prediction at 1 GPa with 2-4 molecules per cell. 







7^ 




(a) 



(b) 



(=) 



FIG. 8. Crystal structures of benzene (a)orthorhombic phase I {Pbca, Z=4); (b)tetragonal phase II {PAz2i2, Z=4); (c) monoclinic phase 
{Plilc, Z=2). 



Without any experimental information, we found /3-glycine 
(Fig. l9b) as a metastable structure with Z = 2; and 7-glycine 
(Fig. |9}l) as the best structure with Z = 3. We also found 
a-glycine as a metastable form in the calculation with Z = 4 
(Fig. ^p) at 2.0 GPa. This shows the power of our evolu- 
tionary search method. However, GGA+D results show that 
a glycine possesses the lowest enthalpy, while 7 and /? phase 
are 20 meV/molecule and 30 meV/molecule higher, respec- 
tively. Yet, the experimental results demonstrated the relative 
thermodynamic stability to be 7 > a > /?. This shows the 
need for better ways of computing intermolecular interaction 
energies. 



Butane-l,4-diaminonium dibromide 



rounded by four -NH3+ groups. During the process of ro- 
tational mutation, both the orientation of the whole molecular 
group and its flexible torsional angles are allowed to change. 
A large fraction of rotation (^ 40%) of the molecules is found 
to greatly speed up the prediction. This success confirmed 
that our constrained evolutionary algorithm can be straight- 
forwardly adapted to deal with flexible molecules. 



Inorganic crystals 

Apart from molecular crystals, this new approach is also 
applicable to inorganic crystals with complex ions or clusters. 
Below are a few illustrations. 



The molecules we discussed so far are rigid or nearly rigid. 
Is it possible to use this approach to study the packing of flex- 
ible molecules? To investigate this, we applied it to the pre- 
diction of crystal structure of butane- 1,4-diammonium dibro- 
mide, in which Br^ and C4Hi4N2^+ can be described as two 
molecular units that form the structure. 

By using the experimental cell parameters ll72ll . we indeed 
observed numerous structures with different conformations 
of the C4Hi4N2^+ molecular ion. USPEX firstly found the 
energetically favorable conformation, and then identified the 
ground state structure at the 12th generation (about 500 struc- 
tural relaxations): P2i/c butane- 1,4-diammonium dibromide. 



In this structure, as shown in Fig. 10 the organic hydrocar- 
bon chains are found to pack in a herringbone-type stacking 
with hydrogen bonds to the Br^. Each Br^ anion is sur- 



Complex ionic solids: example of hydrogen storage materials 

Reversible hydrogen storage materials recently attracted 
great interest 1731 . Two groups of complex metal hydrides: 
alumohydrides containing AIH4 groups and borohydrides 
with BH4 groups have been recently under intensive study 
Il74l477l . Numerous dehydriding and rehydriding processes 
have been predicted theoretically and tested experimentally. 
In a good candidate material, dehydridation should happen at 
acceptably low temperatures. Structure prediction for such 
systems can guide the experimentalists to synthesize the de- 
sired compounds in the laboratory. 

The crystal structure of Mg(BH4)2 has been extensively 
investigated. It was experimentally solved, and found to 
be extremely complex (330 atoms per unit cell for the low- 




FIG. 9. Glycine polymorphs found by USPEX. a) representation of glycine zwitterion; b) a-glycine at 2 GPa(Z=4, a=5.390 A, b=5.91 1 A, 
c=10.189 A, /3=1 13.2 °); c) /3-glycine at 0.4 GPa(Z=2, a=5.372 A, b=6.180 A, c=5.143 A, /J=l 11.9 °); d) 7-glycine at 1 GPa(Z=3, a=b=7.070 
A, c=5.490 A). 



temperature phase with P6i symmetry f74l). Recent theo- 
retical work then predicted a new body-centered tetragonal 
phase (with /-4m2 symmetry), which has slightly lower en- 
ergy than P6i phase; it was found using the prototype electro- 
static ground-state approach (PEGS) |75 1. Later, based on the 
prototype structure of Zr(BH4)4, another orthorhombic phase 
with F222 symmetry was found to have even lower energy 
than all previously proposed structures [771 . 



In general, the previous theoretical discoveries of novel 
Mg(BH4)2 phases were conducted either by ad hoc extensive 
searching or by chemical intuition. However, our evolutionary 
algorithm does not rely on any prior knowledge except chem- 
ical composition, and could be particularly useful for predict- 
ing stable crystal structures for these complex metal hydride 
systems. If we consider the BH4^ ion as a molecular group, 
the search space would be dramatically reduced. Within 10 
generations (^400 structure relaxations), USPEX found the 
F222 phase (Fig. [TT^ ) as the most stable structure at ambi- 
ent pressure. In addition, I-Am2 (Fig. [TTJd) was also found by 
USPEX in the same calculation, with enthalpy less than 1.2 
meV/atom above that of F222 phase. Compared to the pre- 
vious work, our method is clearly more universal and robust, 
enables efficient structure prediction for complex molecular 
systems, both organic and inorganic. 



Cluster-based crystals: example of elemental boron 



Boron, located in a unique position of Periodic Table, is an 
element of chemical complexity due to the subtle balance be- 
tween localized and delocalized electronic states. All known 
structures of boron contain icosahedral B12 clusters. Recent 
experiment |78 | found a new phase of pure boron (7-B28) at 
pressures above 10-12 GPa, and its structure was solved using 
USPEX with fixed experimental cell dimensions |78|. Sur- 
prisingly, 7-B28 showed different chemistry compared with 
all the other elemental boron allotropes. In the 7-B28 struc- 
ture (Fig. [T2J3), the centers of the B12 icosahedra form a dis- 
torted cubic close packing as in a-Bi2 (Fig. 12 1); but with all 
octahedral voids are occupied by B2 pairs. The 7-B28 struc- 



ture resembles a NaCl-type structure, with the B12 icosahedra 
and B2 pairs as 'anions' and 'cations'. Finding this structure 
without fixing cell parameters was reported to be exceedingly 
difficult 179:1, but latest methodological developments enable 
this (Lyakhov et at., unpublished). However, the problem can 
be made very simple if we recall that all boron phases con- 
tain B12 icosahedra. Here we treated B12 icosahedral and B2 
pairs as separate rigid units, and performed structure predic- 
tion runs at different numbers of B12 and B2 units (2:1, 1:1, 
2:2, 2:4, etc) at ambient conditions. We could easily find 7- 
B28 within 2-3 generations or ^100 structural relaxations. 
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(a) 



(b) 



FIG. 10. Butane- 1 ,4-diamniomum dibromide polymorph found by USPEX (space group: Plilc, Z=2). a), representation of network, view 
from a axis; b) Br~ coordination environment, view from b axis. For clarity, hydrogen atoms are not shown in b. The C4Hi4N2^''' molecular 
ion has 6 flexible angles, and the unit cell of the stable polymorph contains 44 atoms. 











• • 
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• • 

^ • • ^ * J» -J 




(a) 



(b) 



FIG. II. Mg(BH4)2 polymorphs found by USPEX. a) F222 phase; b) /4i22 phase. 



Meanwhile, we observed a set of low-energy and chemically 
interesting structures with different proportions of B12 and 
B2. For instance, the novel metallic phase B52 with the Pnn2 



symmetry (Fig. 12 ;) was calculated to be only 12 meV/atom 
higher in energy than 7-B28 at atmosphere pressure. Its en- 
ergy is lower than those of the experimentally observed phases 
(such as the T-50 phase [80 1) and this example shows that 
our method can be used for even non-molecular and inorganic 
solids that contain clusters or complex ions. 



DISCUSSION AND CONCLUSIONS 



In the original version of USPEX |6|, the stable crystal 
structure was assembled from individual atoms, which was 
also shown to work well for atomic crystals and also for 
simple molecular systems (like carbon dioxide, water, urea). 
However, it is clear that for molecular crystals improvements 
of the efficiency can be made if the structure is assembled 
from whole molecules rather than individual atoms. This is 
confirmed by the present study. Our constrained global opti- 
mization method allows one to find the stable crystal structure 
of a given molecular compound, and provides a set of low- 
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(a) (b) (c) 



FIG. 12. Crystal structures of boron, a) Q-B12; b) 7-B28; c) novel metastable B52 phase, space group Pnn2, a=8.868 A, b=8.777 A, c=5.000 
A. B 1(0.5777,0.7728,0), B2(0.9187,0.7321,0.3222), B3(0.7464,0.7488,0.4978), B4(0.5902,0.6690,0.3087), B5(0.6243,0.8678,0.3055), 
B6(0.8209,0.9090,0.3202), B7(0.8698,0.6288,0.0229), B8(0.7835,0.5798,0.3273), B9(0.6684,0.5795,0.0182), B10(0.7190,0.9189,0.0056), 
Bll(0.8991,0.8309,0.0115), B12(0.7461,0.7461,0.8302), B13(0,0,0.7529), B14(0,0.5,0.9161) 



energy metastable structures at a highly affordable cost. 

The reasons why evolutionary algorithms succeed in crystal 
structure prediction have been discussed before Q- As men- 
tioned in Sec. 11, in addition to these, the constrained global 
optimization fixes the molecular connectivity, and brings the 
need for new variation operators (rotational mutation and soft- 
mutation), developed and described here. 

For efficient and reliable polymorph prediction, the popula- 
tion of structures should be sufficiently diverse. A major diffi- 
culty in the prediction of molecular crystals is the large num- 
ber of plausible candidate structures that can have very close 
energies lISTI . Given the complexity of their energy landscape, 
high diversity of the population of the structures is mandatory 
for successful prediction of molecular crystal structures. The 
initial population is particularly important, and it is usually a 
good idea to add a number of random symmetrized structures 
in each generation, to keep sampling of the landscape diverse. 

The presented algorithm provides not only the theoretical 
ground state, but also a number of low-energy metastable 
structures. With inclusion of zero-point energy and entropic 
contributions, such structures may become stable. Even if this 
does not happen, low-energy metastable structures have a rel- 
atively high chance to be synthesized at special conditions. 

While DFTh-D is today's state of the art and its accuracy 
is often sufficient, for some systems (glycine), DFTh-D is too 
crude, and more reliable approaches for computing the energy 
are needed. Under high pressure many of the difficulties dis- 
appear, because the vdW interactions (poorly accounted for by 
today's ab initio methods) become relatively less important. 

Clearly, the quality of the global minimum found by US- 
PEX depends on the accuracy of the theory used for energy 
calculations and structure relaxation. Current levels of the- 
ory can be roughly divided into empirical, semiempirical, and 
ab initio approaches. Accurate empirical force fields are ap- 
propriate for CSP, but reliable parameterizations are hard to 



generate for most molecules. In contrast to empirical force 
fields, ab initio calculations provide a more accurate and rig- 
orous description without parameterization, but the calcula- 
tions are much more time-consuming. In our prediction, we 
adopt the DFTh-D level of theory, which combines "the best 
of both worlds", i.e. an accurate representation of intermolec- 
ular repulsions, hydrogen bonding, electrostatic interactions, 
and vdW dispersions. DFTh-D proved to be reliable for most 
systems, but its results are not fully satisfactory for glycine. 
This shows that further improvements in theoretical calcula- 
tions of intermolecular interactions energies are needed. In 
parallel with the improvement of methods for energy rank- 
ing, there is a need for efficient and reliable algorithms for 
global optimization of the theoretical energy landscape, and 
present work is an important development in this direction. 
In the present paper, we describe the most important ingredi- 
ents of this method, and demonstrate how it enables affordable 
structure prediction for many complex organic and inorganic 
systems at ab initio level. 

In summary, we have presented a new efficient and reliable 
approach for global energy optimization for molecular crystal 
structure prediction. It is based on the evolutionary algorithm 
USPEX extended to molecular crystals by additional varia- 
tion operators and constraints of partially or completely fixed 
molecules. The high efficiency of this method enables fully 
quantum-mechanical structure predictions to be performed at 
an affordable computational cost. Using this method, we suc- 
ceeded in finding the stable structures for systems with various 
rigid molecular shapes (tetrahedral, linear, bent, planar and 
complex molecules), and different bonding situations (vdW 
bonding, ionic, covalent, metallic, weak and strong hydrogen 
bonding, tt-tt stacking, etc). We showed that even large sys- 
tems can be efficiently dealt with by this approach, even at the 
ab initio level of theory. This new approach also has wide ap- 
plicability to inorganic crystals containing clusters and com- 
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plex ions. 
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